30 research outputs found

    Stochastic Query Covering for Fast Approximate Document Retrieval

    We design algorithms that, given a collection of documents and a distribution over user queries, return a small subset of the document collection such that high-quality answers to user queries can be provided efficiently using only the selected subset. This approach has applications when space is a constraint or when the query-processing time increases significantly with the size of the collection. We study our algorithms through the lens of stochastic analysis and prove that, even though they use only a small fraction of the entire collection, they can provide answers to most user queries, achieving performance close to optimal. To complement our theoretical findings, we experimentally show the versatility of our approach by considering two important cases in the context of Web search. In the first case, we favor the retrieval of documents that are relevant to the query, whereas in the second case we aim for document diversification. Both the theoretical and the experimental analysis provide strong evidence of the potential value of query covering in diverse application scenarios.
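
The abstract above does not spell out the algorithms themselves. As a rough illustration of the general idea of covering a query distribution with a small document subset, the following is a minimal sketch of a greedy weighted-coverage heuristic. The function name `greedy_query_cover`, the input layout, and the greedy rule are all assumptions made for this example, not the authors' method.

```python
def greedy_query_cover(query_probs, relevant_docs, budget):
    """Pick up to `budget` documents, greedily maximizing covered query mass.

    query_probs:   dict mapping query -> probability (assumed input format)
    relevant_docs: dict mapping query -> set of doc ids that answer it
    Returns (selected doc ids, total probability of covered queries).
    """
    # Invert the mapping: which queries can each document answer?
    doc_to_queries = {}
    for query, docs in relevant_docs.items():
        for doc in docs:
            doc_to_queries.setdefault(doc, set()).add(query)

    selected, covered = [], set()
    for _ in range(budget):
        best_doc, best_gain = None, 0.0
        # Sorted iteration keeps tie-breaking deterministic.
        for doc in sorted(doc_to_queries):
            if doc in selected:
                continue
            # Gain = probability mass of queries this doc newly covers.
            gain = sum(query_probs.get(q, 0.0)
                       for q in doc_to_queries[doc] - covered)
            if gain > best_gain:
                best_doc, best_gain = doc, gain
        if best_doc is None:  # nothing left adds coverage
            break
        selected.append(best_doc)
        covered |= doc_to_queries[best_doc]

    covered_mass = sum(query_probs.get(q, 0.0) for q in covered)
    return selected, covered_mass


# Hypothetical toy data: three queries, three documents.
probs = {"a": 0.5, "b": 0.3, "c": 0.2}
answers = {"a": {"d1"}, "b": {"d1", "d2"}, "c": {"d3"}}
subset, mass = greedy_query_cover(probs, answers, budget=2)
```

In this toy run a query counts as covered as soon as one selected document answers it; a real system would presumably need a notion of answer quality as well.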

    Blake, Charles

    Caching search results is employed in information retrieval systems to expedite query processing and reduce back-end server workload. Motivated by the observation that queries belonging to different topics have different temporal-locality patterns, we investigate a novel caching model called STD (Static-Topic-Dynamic cache). It improves on the traditional SDC (Static-Dynamic Cache), which stores the results of popular queries in a static cache and manages a dynamic cache with a replacement policy to capture temporal variations in the query stream. Our proposed caching scheme adds a further layer for topic-based caching, where entries are allocated to different topics (e.g., weather, education). The results of queries characterized by a topic are kept in the fraction of the cache dedicated to that topic. This makes it possible to adapt cache-space utilization to the temporal locality of the various topics, and it reduces cache misses caused by queries that are neither popular enough to be in the static portion nor requested within short enough intervals to be in the dynamic portion. We simulate different configurations for STD using two real-world query streams. Experiments demonstrate that our approach outperforms SDC, increasing hit rates by up to 3% and reducing the gap between SDC and the theoretically optimal caching algorithm by up to 36%.
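
The three-layer layout described in this abstract can be sketched as follows. This is a minimal sketch under stated assumptions: the class names `LRUCache` and `STDCache` are hypothetical, LRU is assumed as the replacement policy for both the topic and dynamic layers (the abstract does not name one), and the topic of each query is assumed to be supplied by the caller.

```python
from collections import OrderedDict


class LRUCache:
    """Small least-recently-used cache (assumed replacement policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        if self.capacity <= 0:
            return
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used


class STDCache:
    """Static layer + per-topic layers + dynamic layer (illustrative only)."""

    def __init__(self, static_results, topic_capacities, dynamic_capacity):
        self.static = dict(static_results)  # fixed set of popular queries
        self.topics = {t: LRUCache(c) for t, c in topic_capacities.items()}
        self.dynamic = LRUCache(dynamic_capacity)

    def _layer_for(self, topic):
        # Queries with no known topic fall back to the dynamic layer.
        return self.topics.get(topic, self.dynamic)

    def lookup(self, query, topic=None):
        """Return cached results, or None on a miss."""
        if query in self.static:
            return self.static[query]
        return self._layer_for(topic).get(query)

    def store(self, query, results, topic=None):
        """Cache results fetched from the back end after a miss."""
        if query not in self.static:
            self._layer_for(topic).put(query, results)


# Hypothetical usage: one static entry, one topic slot, one dynamic slot.
cache = STDCache({"news": ["n1"]}, {"weather": 1}, dynamic_capacity=1)
cache.store("rain rome", ["w1"], topic="weather")
cache.store("cheap flights", ["f1"])
cache.store("used cars", ["c1"])  # evicts "cheap flights" from the dynamic layer
```

Because the weather entry lives in its own partition, it survives the churn in the dynamic layer, which is the effect the abstract attributes to topic-based allocation.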

    Caching Historical Embeddings in Conversational Search

    Rapid response, namely low latency, is fundamental in search applications; it is particularly so in interactive search sessions, such as those encountered in conversational settings. One observation with the potential to reduce latency is that conversational queries exhibit temporal locality in the lists of documents retrieved. Motivated by this observation, we propose and evaluate a client-side document embedding cache that improves the responsiveness of conversational search systems. By leveraging state-of-the-art dense retrieval models to abstract document and query semantics, we cache the embeddings of documents retrieved for a topic introduced in the conversation, as they are likely relevant to successive queries. Our document embedding cache implements an efficient metric index, answering nearest-neighbor similarity queries by estimating the approximate result sets returned. We demonstrate the efficiency achieved using our cache via reproducible experiments based on TREC CAsT datasets, achieving a hit rate of up to 75% without degrading answer quality. The high cache hit rates achieved significantly improve the responsiveness of conversational systems while also reducing the number of queries handled by the search back-end.

    How future surgery will benefit from SARS-COV-2-related measures: a SPIGC survey conveying the perspective of Italian surgeons

    COVID-19 negatively affected surgical activity, but the potential benefits resulting from adopted measures remain unclear. The aim of this study was to evaluate the change in surgical activity and the potential benefit of COVID-19 measures from the perspective of Italian surgeons on behalf of SPIGC. A nationwide online survey on surgical practice before, during, and after the COVID-19 pandemic was conducted in March-April 2022 (NCT:05323851). Effects of COVID-19 hospital-related measures on surgical patients' management and personal professional development across surgical specialties were explored. Data on demographics, pre-operative/peri-operative/post-operative management, and professional development were collected. Outcomes were matched with the corresponding volume. Four hundred and seventy-three respondents were included in the final analysis across 14 surgical specialties. Since the SARS-CoV-2 pandemic, the application of telematic consultations (4.1% vs. 21.6%; p < 0.0001) and diagnostic evaluations (16.4% vs. 42.2%; p < 0.0001) increased. Elective surgical activities were significantly reduced, and surgeons opted more frequently for conservative management with a possible indication for elective (26.3% vs. 35.7%; p < 0.0001) or urgent (20.4% vs. 38.5%; p < 0.0001) surgery. All new COVID-related measures are perceived to be maintained in the future. Surgeons' personal education online increased from 12.6% (pre-COVID) to 86.6% (post-COVID; p < 0.0001). Online educational activities are considered a beneficial effect of the COVID pandemic (56.4%). COVID-19 had a great impact on surgical specialties, with a significant reduction of operation volume. However, some forced changes turned out to be benefits. Isolation measures pushed the use of telemedicine and telemetric devices for outpatient practice and favored communication for educational purposes and surgeon-patient/family communication. From the Italian surgeons' perspective, COVID-related measures will continue to influence future surgical clinical practice.

    USI Participation at SMERP 2017 Text Summarization Task

    This short report describes the participation of the Università della Svizzera italiana (USI) at the SMERP Workshop Data Challenge Track for the Level 1 text summarization task. Our participation is based on a linear interpolation that combines the relevance and novelty scores of the retrieved tweets. Our method is fully automatic. For the relevance score we used the results from our runs at the text retrieval task, whereas for the novelty score we used a method based on Word2Vec. In total, we submitted four different runs and we used two different weight parameters. The results showed that when relevance and novelty contribute equally to selecting the tweets used for the summary, performance is better than when only novelty is favored. Additionally, information from POS tags improves the performance of the summarization task.
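
The linear interpolation of relevance and novelty mentioned in this abstract can be illustrated with a short sketch. Everything specific here is an assumption for the example: the function names, the greedy selection loop, and the definition of novelty as one minus the maximum cosine similarity to the tweets already selected. The report only states that novelty is based on Word2Vec, so the vectors below stand in for precomputed tweet embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def summarize(tweets, weight, k):
    """Greedily pick k tweets by weight*relevance + (1-weight)*novelty.

    tweets: list of (text, relevance, vector) tuples; the vector is an
            assumed precomputed embedding of the tweet.
    weight: interpolation weight in [0, 1]; 0.5 gives equal contribution.
    """
    selected = []
    remaining = list(tweets)
    while remaining and len(selected) < k:

        def score(tweet):
            _, relevance, vector = tweet
            # Novelty drops as the tweet resembles something already chosen.
            max_sim = max((cosine(vector, s[2]) for s in selected), default=0.0)
            novelty = 1.0 - max_sim
            return weight * relevance + (1.0 - weight) * novelty

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [text for text, _, _ in selected]


# Hypothetical toy input: "a" and "b" are near-duplicates, "c" is different.
toy = [
    ("a", 0.9, [1.0, 0.0]),
    ("b", 0.8, [1.0, 0.0]),
    ("c", 0.5, [0.0, 1.0]),
]
balanced = summarize(toy, weight=0.5, k=2)
relevance_only = summarize(toy, weight=1.0, k=2)
```

With equal weights the second pick skips the near-duplicate in favor of the dissimilar tweet, while a weight of 1.0 reduces the procedure to ranking by relevance alone.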

    Cache Optimization Via Topics in Web Search Engines

    Embodiments may provide a cache for query results that can adapt the cache-space utilization to the popularity of the various topics represented in the query stream. For example, a method for query processing may perform receiving a plurality of queries for data and requesting data responsive to at least one query from a data cache comprising a temporal cache, wherein the temporal cache is configured to store data based on a topic associated with the data and is configured to retrieve data based on a topic, and wherein the data cache is configured to retrieve data responsive to at least one query from the computer system

    Emotional Influence Prediction of News Posts

    Nowadays, online news agents post news articles on social media platforms with the aim of spreading information as well as attracting more users and understanding their reactions and opinions. Predicting the emotional influence of news on users is very important not only for news agents but also for users, who can filter out news articles based on the reactions they trigger. In this paper, we focus on the problem of predicting the emotional influence of a news post on users before publication. For the prediction, we explore a range of textual and semantic features derived from the content of the posts. Our results show that terms are the most important feature and that features extracted from the content of news posts make it possible to effectively predict the amount of emotional reactions triggered by a news post.